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A domain decomposition parallelization of the 
Fast Marching Method 

By M. Herrmann 


1. Motivation and objectives 

Evolving interfaces play an important role in a multitude of different areas, ranging 
from fluid mechanics, combustion, and grid generation to material sciences, semiconduc- 
tor manufacturing, seismic analysis, and control problems [see Sethian (19996) for a de- 
tailed overview]. Traditionally, interfaces have been treated in a Lagrangian framework 
tracking their evolution by, for example, marker particles [see among others Brackbill 
et al. (1988)]. In recent years, however, describing the topology and evolution of inter- 
faces by Eulerian partial differential equations (PDE) has become ever more popular 
since this approach offers certain theoretical and computational advantages over the La- 
grangian formulation (Sethian 19996). Depending on the type of problem, two different 
solution strategies for the Eulerian approach exist. In the case of an initial value prob- 
lem, level set methods (Osher & Sethian 1988), or alternatively Volume-of-Fluid methods 
(Noh & Woodward 1976), can be employed to solve the evolving interface. In the case of 
a boundary value problem, the Fast Marching Method (Sethian 1996a) has emerged as 
the efficient solution method. 

In this paper, we focus on two boundary value problems that typically arise in the 
numerical implementation of level set methods, namely reinitialization and redistribution. 
In level set methods, an iso-surface of the level set scalar G, 


G{x , t) = Go = const , 


( 1 . 1 ) 


is used to define the location of an arbitrary shaped interface T. The transport equation 
for the scalar G can then be derived from simple kinematic considerations as 

f)G 

- + u-VG = 0. (1.2) 

Since this so-called level set equation (1.2) is valid only at the location of the interface 
itself, the choice of G outside the interface, i.e. G ^ Go, is in principle arbitrary. However, 
for numerical reasons, G is generally chosen to be a distance function, 


|VG| 


= 1 . 

G#G 0 


(1.3) 


Enforcing this condition is usually called reinitialization. 

Furthermore, some quantities S may be defined only at the location of the interface. 
In order to extend these quantities to the whole computational domain, they are set 
constant in the interface normal direction, 


X7S ■ VG = 0 . (1.4) 

This procedure is called redistribution, since it redistributes the values of S from the 
interface into the surrounding domain. 
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Both, Eqs. (1.3) and (1.4) constitute boundary value problems, since G and S are 
defined on T only. Several different numerical methods exist to solve these equations. 
Among these are brute force approaches based on calculating the minimum of the ge- 
ometric distance between each grid node in the computational domain and every point 
on the interface (Merriman et al. 1994), and PDE-based approaches like the iteration 
scheme by Sussman et al. (1994). While the former are computationally very expensive, 
the latter inadvertently alter both the location of interface and the value of S on the 
interface due to their iterative nature. In the case of the reinitialization equation (1.3), 
supplementary fixes addressing this problem relatively successfully have been proposed 
(Russo & Smereka 2000; Peng et al. 1999; Sussman et al. 1998; Sussman & Fatemi 1999; 
Enright et al. 2002). 

An alternative approach to solving Eqs. (1.3) and (1.4) is the so-called Fast Marching 
Method (FMM). It was originally proposed by Tsitsiklis (1994, 1995) and applied to 
the level set formulation by Sethian (1996a) and Helmsen et al. (1996). The FMM is a 
non-iterative procedure that explicitly makes use of the way information in Eqs. (1.3) 
and (1.4) propagates. The FMM is thus theoretically optimal in its operation count. 
Still, typical problem sizes of state of the art numerical simulations in general require 
a domain decomposition approach for parallel computing. However, the FMM is highly 
sequential and hence not straightforward to parallelize in a domain decomposition sense. 
At least to the knowledge of the author, no domain decomposition parallelization of the 
Fast Marching Method has been published yet. 

This paper is structured as follows: first, the standard, sequential FMM is reviewed. 
Then, different parallelization strategies are discussed and a domain decomposition par- 
allelization is proposed. Thereafter, some preliminary results concerning the speedup of 
the proposed method are presented and discussed. Finally, conclusions are drawn and an 
outlook to future work is given. 


2. The sequential Fast Marching Method 

In this section, a short overview of the standard Fast Marching Method is given. For 
further details, the interested reader is referred to Sethian (1996a), Adalsteinsson & 
Sethian (1999), and Sethian (1999a). 

The idea of the FMM for level sets is to first solve Eq. (1.3) and use its solution to 
then solve Eq. (1.4) (Adalsteinsson & Sethian 1999). Hence, we will at first focus on the 
solution of Eq. (1.3). 

To solve Eq. (1.3) correctly, the gradient operator has to be approximated by upwind, 
entropy-satisfying finite differences (Sethian 1999a). The approximation most often used 
is due to Godunov (Rouy & Tourin 1992): 


|VG| 


[max(D-G,-D^G,0) 2 + 
max(^G,-D+^G,0) 2 + 
ma xiDT.lG-D+lG^) 2 ] 1 ' 


(2.1) 


where _Dt) a . are difference notations. For example, the first order approximation is 


D «l G = 


Gijk — Gi— 


i—ljk 


Ax 


D ijk G = 


Gi 


£+ 1 j k 


- G 


ijk 


Ax 


( 2 . 2 ) 
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(a) Calculate all node values that are directly adjacent to the interface and tag them as 
accepted. Tag all nodes adjacent to these accepted nodes as close and all others as far. 

(b) Calculate G of all close nodes by Eq. (2.3), treating G in any adjacent close or far node 
as oo. Set the loop index n = 1. 

(c) Mark as accepted the close node ijk with the smallest G value, denoted by G n = Gijk- 

(d) Mark all far nodes adjacent to Gijk as close. 

(e) Recalculate the G values of all close nodes adjacent to Gijk by Eq. (2.3), treating G in 
any adjacent close or far node as oo. 

(f) Set n = n + 1 and return to step (c) until all nodes are accepted. 

Table 1. The sequential FMM algorithm 


Thus, the discretized version of Eq. (1.3) solved in the FMM reads as 


max(DT j lG,-D+lG,0) 2 + 


m ax (D^ k G,-D+lG,Of + 

1 1/2 

max(Dr*G,-D+£G, O) 2 =1. 


(2.3) 


Provided that the G values of all nodes neighboring ijk are given, Eq. (2.3) constitutes a 
quadratic equation yielding Gijk itself. The simple, albeit inefficient way to solve Eq. (2.3) 
throughout the whole computational domain is to iteratively update each node in the 
domain by Eq. (2.3) until a stationary solution is reached (Rouy & Tourin 1992). 

However, this approach neglects to take advantage of the fact that, due to the upwind 
structure of Eq. (2.3), information propagates only from smaller to larger values of G. 
This yields attribute 1 of the FMM: 


Attribute 1 . A node value G^k is determined only by those neighboring nodes of 
smaller value. It can thus globally depend at most on those nodes in the domain that are 
of smaller value. 

Attribute 1 implies that a smallest node is fixed and cannot change its value. Hence, 
given appropriate boundary conditions for Gijk at or adjacent to the interface G = Go, 
updates of Gijk according to Eq. (2.3) can be confined to a narrow band around the 
globally smallest values that sweeps outward to ever larger values of Gijk ■ For details see 
Sethian (1996a), Sethian (1999a), and Adalsteinsson & Sethian (1999). 

In summary, this leads to the sequential FMM algorithm given in Table 1. The algo- 
rithm is executed once for all nodes G° k < Go and once for all nodes G° k > Go, where 
the superscript 0 denotes the initial values of G at node ijk. Furthermore, the following 
attribute of the Fast Marching Method can be discerned: 

Attribute 2. The sequential loop steps (c)-(f) sort all accepted nodes that are not 
initially accepted in step (a) in ascending order, i.e. G n+1 > G n . 

Using a heap sort algorithm with back pointers (Sethian 1996 b) to locate the smallest 
value G in step (c) makes the sequential FMM algorithm highly efficient with a theoretical 
operation count of 0(N log N). 
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(A m ) Perform steps (a) and (b) of the sequential FMM algorithm. 

(B m ) Send all nodes with G° jfc < Go to process #0, all nodes with G°j k > Go to process #1. 
(Cm) Perform sequential FMM algorithm steps (c)-(f). 

(E m ) Receive results for nodes G°j k < Go from process #0 and results for nodes G^*, > Go 
from process #1. 

Table 2. The parallel FMM algorithm V\ 


3. Parallelizing the Fast Marching Method 

In this section, different strategies to parallelize the FMM are discussed. First, a simple 
parallelization strategy not based on domain decomposition is given, followed by the 
discussion of several domain decomposition parallelization approaches. 

3.1. Go-based, parallelization 

The trivial way to parallelize the FMM algorithm described in table 1 is to execute 
the complete sequential algorithm for all nodes < G 0 and for all nodes G^- k > 
Gq in parallel, since neither region influences the other. The resulting algorithm P\ is 
summarized in Table 2. 

This parallelization strategy has two obvious drawbacks. First, the parallelization is 
limited to two parallel processes, since only two independent regions exist. Secondly, 
depending on the problem size, memory resource problems arise, because both processes 
need to work on the whole computational domain. 

3.2. Domain decomposition parallelization 

Domain decomposition parallelization of the sequential FMM algorithm poses two prob- 
lems. First, the globally smallest close value has to be located in step (c). Although this 
procedure is non-local by definition, it still is easy to parallelize, since each domain can 
compute its locally smallest value independently, and then simply use these to find a 
global minimum. Second, a new globally smallest value G n can be found in step (c) only 
after steps (d)-(e) of the previous loop step n — 1 have been executed. 

Nevertheless, leaving at first efficiency considerations aside, domain decomposition is 
straightforward. In order to simplify the notation in the following, we only discuss the 
domain decomposition into two neighboring domains D m and D m+ i. All arguments, how- 
ever, apply analogous to the three-dimensional domain decomposition into an arbitrary 
number of domains. Figure 1 depicts some naming conventions that are employed in 
the following. Each domain D m is extended by r ghost nodes in the domain boundary 
normal direction, where r is the spatial order of the difference approximation, Eq. (2.2). 
Here, only the first order approximation is employed. Hence, each domain is extended as 
depicted in Fig. 1. 

Assuming that each process m = 0 . . . N p works only on part T> m of the whole compu- 
tational domain, Table 3 summarizes the parallel FMM algorithm Note that V 2 has 
to be executed twice on each process, once for all nodes G ( -,- k < G 0 and once for all nodes 
G ( - jk > Go- In essence, disregarding step (A), algorithm V 2 is a domain decomposed 
sequential algorithm, because globally only one single node in a single domain is updated 
per loop step, while all other domains are waiting idle. However, algorithm V 2 introduces 
a clearly defined inter-domain communication boundary. Recalling that the update of 
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Domain T?n 


Qm 

\ 



j Domain T> m +l 



Q m+l 


Figure 1. Domain naming conventions. 


(A) Perform steps (a) and (b) of the sequential FMM algorithm. 

(B) Locate the locally smallest close G value; denote it as G-D m . 

(C) Find the globally smallest close G value, Gg = min(G-D 0 N ), and mark it as accepted. 
If Gg £ Dm, go to next step, else go to step (F). 

(D) Perform sequential FMM algorithm steps (c)-(e) for Gg- 

(E) If any of the close nodes, updated in step (e) of the sequential FMM, belong to Z3 m , 
communicate them to Gm+i of domain V m + 

(F) Set n = n + 1. Return to step (B) until all local nodes are accepted. 

Table 3. The parallel FMM algorithm V 2 


Gijk, Eq. (2.3), using first order approximations, Eq. (2.2), does involve at most the 
directly adjacent nodes in the ±i, ±j, and ±k direction, any local node in domain V m 
can be updated correctly, provided that all ghost nodes Q m = B m +i are known. Hence, 
only changes of B m + 1 need to be communicated to Q m , as is done in step (E) above. 

Taking these arguments into account, one can in fact avoid the strict sequential nature 
of algorithm V 2 ■ Looking for example at domain T> m , as long as no change in Q m occurs, 
the locally smallest close value Gx> m can be moved into a local accepted group and 
updates of the nodes adjacent to Gv m can be performed according to the sequential 
FMM algorithm steps (d) and (e). 

However, special care must be taken whenever a node belonging to Q m , for example 
G^k, changes. If Gijk changes to accepted status, then, recalling attribute 1, all locally 
accepted nodes belonging to T> m that are smaller than or equal to Gijk cannot be influ- 
enced by this change. Conversely, all locally accepted nodes belonging to T> m larger than 
Gijk might be wrong, since they could depend on Gijk ■ Hence, to allow for a consistent 
algorithm, each domain has to be able to rollback to its state at the beginning of loop 
step p, where p is such that G 23 ” 1 < G t jk < G p . The same argument applies, if G,jk 
changes from accepted to close status through a rollback operation in domain 'D m+ 1 . 
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(A) Perform steps (a) and (b) of the sequential FMM algorithm. 

(B) Check, if any node belonging to Q m changed status with new value Gb- If so, rollback to 
state S p , where p is given by G p_1 < Gb < G p . Mark Gb as close and insert it into list 
Cb- Set n = p. 

(C) Locate the locally smallest close G value (including list £g); denote it as Gx> m . 

(D) Perform steps (c)-(e) of the sequential FMM algorithm for Gu m . 

(E) If node Gp m or any of its adjacent nodes updated in step (D) belongs to B m , communicate 
them to Qm + 1 of domain T> m + 1. 

(F) Store the current state as S n . 

(G) Set n=n+l. Return to step (B) until all local nodes are accepted. 

(H) Wait until all other domains reach step (H) or a node belonging to Q m changes status. 
In the latter case, go to step (B). 

Table 4 . The parallel FMM algorithm V3 


These considerations lead to the domain decomposed parallel algorithm V3 summarized 
in Table 4. 

The drawback of algorithm V3 is two-fold. First, the complete state of the local domain 
has to be stored at every loop step in order to allow for a possible rollback to this state. 
Second, any change in status of a node belonging to Q m leads to a rollback operation in 
domain T> m . To overcome these shortcomings, we will make use of the following additional 
attribute of the FMM: 

Attribute 3. Let be the solution to Eq. (2.3) at loop step p. If any single one 
of the adjacent nodes i'j'k' becomes smaller, i.e. G v V y k , < G v V y k , with p' > p, then a 
subsequent update of node ijk by Eq. (2.3) yields 

The proof of attribute 3 is straightforward. Attribute 3 implies that any node which 
has been locally accepted at loop step p and is then rolled back to status close, can retain 
its G^- k value, because any subsequent update will either decrease its value or leave it 
unchanged. Its change back to accepted status at a later loop step is thereby uninfluenced. 
Following the same line of argument, all neighboring close nodes can also retain their 
G v V j, k i values. Thus, step (B) of V3 needs to rollback only the status of the nodes from 
accepted back to close, but does not need to rollback their node values. Furthermore, 
attribute 3 implies that only the change to accepted status of a node in Q m needs to 
initiate a rollback. 

Incorporating attribute 3, the final domain decomposed parallel algorithm V 4 is given 
in Table 5. Obviously, the efficiency of algorithm V 4 depends on the required amount of 
inter-domain communication, i.e. how often nodes belonging to B m change to accepted 
status, and how many rollback operations are required in step (B). If the solution of 
Eqs. (1.3) and (1.4) is required only up to a certain distance T away from the G = Go 
interface, then, naturally, only those nodes within this band, \G ( ) ]k — Go| < T, need to be 
considered in the FMM algorithm. Depending on the geometry of the G = Gq interface, 
this so-called narrow band approach (Peng et al. 1999) may drastically reduce the nec- 
essary inter-domain communication, as illustrated in Fig. 2. In fact, as long as there are 
no nodes ijk belonging to B rn with \G) ]k — Gq| < T, no inter-domain communication is 
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(A) Perform steps (a) and (b) of the sequential FMM algorithm. 

(B) Check, if any node belonging to Q m changed status to accepted and decreased its value 
from a previously accepted value to the new value Gb- If so, rollback by marking all nodes 
QP - n as close, where p is given by G p_1 < Gb < G p . Mark Gb as close and insert it 
into list Cb- Set n = p. 

(C) Locate the locally smallest close G value (including list Cb)', denote it as Gx> m . 

(D) Perform steps (c)-(e) of the sequential FMM algorithm for G- n m . 

(E) If node Gx> m belongs to B m , communicate it to Q m +i of domain O m +i. 

(F) Set n=n+l. Return to step (B) until all local nodes are accepted. 

(G) Wait until all other domains reach step (G) or a node belonging to Q m changes to accepted 
status. In the latter case go to step (B). 

Table 5. The parallel FMM algorithm V4 



Figure 2. Narrow band approach. 


required at all and the problem completely decouples. Although in most applications this 
is not the case, the narrow band approach can still reduce the number of inter-domain 
communications substantially. 


4. Redistribution 

The preceding sections dealt exclusively with the solution of the reinitialization equa- 
tion (1.3) as the prerequisite for solving the redistribution equation (1.4). The basic idea 
in solving Eq. (1.4) is to again confine its solution to a small band around the globally 
smallest G t jk values that marches outward to ever larger values of Gijk- 

To this end, first, initial values of S have to be calculated at all nodes directly adjacent 
to the G = G 0 interface by either first order approximations (Adalsteinsson & Sethian 
1999) or higher order schemes (Chopp 2001). Then, Eq. (1.4) is approximated by 

' ' (l>, ■ D-/ k s) + A+* (l>i'/ k G" ■ D+ls) + 

A-y (djg- ■ njs) + a+» [d+ig- ■ njs) + 

A~ z (- D~jlG n ■ D-/ k s) + A+* (D+/ k G n ■ D+tS ) = 0 . (4.1) 
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Figure 3. Optimal domain decomposition into 2, 4, and 8 domains. 


Here, the switch A x is given by 


A~ x 


i : /r;:G\u) />„;:g 

0 : otherwise , 


(4.2) 


with all other A defined accordingly. The switch A ensures that only those nodes are 
used to evaluate Sijk in Eq. (4.1) that also contributed to the update of G” /c , Eq. (2.3). 
In every FMM loop step, Eq. (4.1) is only solved for the node that changes status to 
accepted in step (c) of the sequential FMM. If this node belongs to B m , its new value S n 
has to be communicated to the corresponding ghost node in domain V m+ i in step (E) 
of the parallel FMM algorithm Vi . 


5. Results 

To evaluate the performance of the proposed parallel FMM algorithm V 4 to solve 
Eqs. (1.3) and (1.4), the results of four different sets of computations are presented in the 
following. All four sets are three-dimensional and based upon the same interface geometry, 
composed of a sphere with radius i?o = 0.25 and center located at x c = (0.5, 0.5, 0.5). 
The level set scalar field representing this sphere is given by 

G(x) = R 0 -\x- x c \ . (5.1) 

The distribution of S on the interface is set to 

S(x) = cos farctan (— — — ^ sin ( arctan ( — Xc = ] ] . (5.2) 

V \z-zJJ \ \V(y-yc ) 2 + (z-z c yJ ) 

All computations are performed on a global domain of size [0,l]x[0,l]x[0,l] discretized by 
192x192x192 equidistant cartesian grid nodes. 

5.1. Optimal domain decomposition 

As mentioned in section 1, information in Eqs. (1.3) and (1.4) propagates in the interface 
normal direction. Hence, a domain decomposition such that all domain boundaries are 
parallel to the interface normal vectors is optimal in the sense that no information does 
cross these boundaries. Figure 3 shows such a decomposition into 2, 4, and 8 domains. 
This domain decomposition is also optimal with regard to load balancing. Since all do- 
mains contain the exact same number of nodes Gjjj, < Go as well as G l:t ) z > Go, the load 
imbalance factor is F\ = 0, see Eq. (5.5) defined later. 
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Figure 4. Speedup due to parallel FMM employing the narrow band approach (left) and whole 
domain update (right) with the optimal decomposition into 1, 2, 4, and 8 domains. Dashed line 
denotes theoretically possible speedup, straight line represents attained speedup. 




Figure 5. Overhead due to parallel FMM employing the narrow band approach (left) and whole 
domain update (right) with the optimal decomposition into 1, 2, 4, and 8 domains. Shown are 
communication factor F c (■ ) and rollback factor F r (■ ). 


Two different sets of computations were performed. The first set employs the narrow 
band approach with the width of the band set to T = 8Ax. The second set performs the 
FMM throughout the whole computational domain. 

Figure 4 shows a comparison of the speedup for both cases to the theoretically pos- 
sible speedup. The theoretically possible speedup was determined by calculating only 
domain number 0, see Fig. 3, using Neumann boundary conditions. Note that, since the 
operation count of the sequential FMM is 0(iV log TV), slightly hyperlinear speedup is 
theoretically possible. However, the actual efficiency attained in the parallel computa- 
tion is approximately 0.96 for the narrow band approach and roughly 0.98 for the whole 
domain update. 

Figure 5 illustrates the overhead due to the parallelization. Depicted are the commu- 
nication factor F c , defined as 

total number of communication operations 

Fc = 


total number of nodes 


(5.3) 
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Figure 6. Non-optimal domain decomposition into 3, 9, and 27 domains. 




Figure 7. Speedup due to parallel FMM employing the narrow band approach (left) and 
whole domain update (right) with the non-optimal decomposition into 1, 3, 9, and 27 domains. 


and the rollback factor F r , defined as 

^ total number of rollback operations , 

F r = . (5.4) 

total number of nodes 

As expected, with optimal domain decomposition, the amount of communication and 
rollback operations is very small, indicating that the overhead due to parallelization is 
minimal. This result is consistent with the observed high efficiency. 

5.2. Non-optimal domain decomposition 

When applying the parallel FMM to actual problems, an optimal domain decomposition, 
as presented in the previous section, is not always viable. In this non-optimal case, three 
major factors can impact the performance of the parallel algorithm: load unbalancing, 
communication overhead, and rollback overhead. To illustrate their effect, two sets of 
computations are performed using a decomposition into 1, 3, 9, and 27 domains as de- 
picted in Fig. 6. The first set again uses the narrow band approach with the width of the 
band set to T = 8Ax, whereas the second set calculates the FMM throughout the whole 
computational domain. 

Figure 7 shows the speedup attained for both the narrow band approach on the left and 
the whole domain update on the right. As can be seen, speedup, and thus efficiency, is 
significantly lower than the optimal domain decomposition case, see Fig. 4. The efficiency 
decreases to 0.17 for the narrow band approach and 0.34 for the whole domain update. 
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Figure 8. Load imbalance factor Fi due to parallel FMM employing the narrow band approach 
(left) and whole domain update (right) with the non-optimal decomposition into 1, 3, 9, and 27 
domains. Shown is Fi for Gijk < Go ( ) and Fi for Gijk > Go (■ )■ 




Figure 9. Overhead due to parallel FMM employing the narrow band approach (left) and whole 
domain update (right) with the non-optimal decomposition into 1, 3, 9, and 27 domains. Shown 
are communication factor F c (j ) and rollback factor F r (■ ). 


To further analyze this behavior, the load imbalance factor 

p i maximum number of nodes per domain ^ 

average number of nodes per domain 

for each of the four different domain decompositions is depicted in Fig. 8. Since the 
parallel FMM algorithm is called twice, once for all nodes G^k < Gq and once for all 
nodes Gijk > Go, Fi is calculated accordingly. Obviously, the load imbalance factors for 
both sets are relatively large. This explains the significantly reduced efficiency of the 
non-optimal domain decomposition as compared to the optimal domain decomposition. 
Otherwise, the load imbalance factors for G^k > Go are roughly the same for both sets. 
However, the load imbalance factor for G^k < Gq is significantly larger in the narrow 
band approach than in the whole domain update, leading to the lower speedup and 
efficiency in the narrow band approach as compared to the whole domain update. 

Figure 9 exhibits the communication factor F c and the rollback factor F r . Compared to 
the optimal domain decomposition case (Fig. 5) both factors are one order of magnitude 
larger. Furthermore, F c and F r are significantly larger in the narrow band approach as 
opposed to the whole domain update. This is due to the aforementioned higher load 
imbalance in the narrow band approach. 
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6. Conclusions and future work 

In this paper, the first domain decomposition parallelization of the Fast Marching 
Method for level sets has been presented. Parallel speedup has been demonstrated in 
both the optimal and non-optimal domain decomposition case. The parallel performance 
of the proposed method is strongly dependent on load balancing separately the number 
of nodes on each side of the interface. A load imbalance of nodes on either side of the 
domain leads to an increase in communication and rollback operations. Furthermore, 
the amount of inter-domain communication can be reduced by aligning the inter-domain 
boundaries with the interface normal vectors. In the case of optimal load balancing and 
aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, 
reaching efficiency factors of up to 0.98. 

Future work will focus on the extension of the proposed parallel algorithm to higher 
order accuracy. Also, to further enhance parallel performance, the coupling of the domain 
decomposition parallelization to the Go-based parallelization will be investigated. 
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